We compare three methods described in the article:

  • HPSS masking method
  • Wavelet approach based on Nongpiur's work
  • Our proposed IS³ method

We first present examples from the generated test set, followed by examples from real-life recordings.

Examples from the generated test set¶

The first few examples come from the test set, which was generated with the same pipeline as the training and validation datasets used in the paper.

Example 1.¶

In [10]:
import numpy as np
import matplotlib.pyplot as plt
from IPython.display import display, Audio
In [11]:
from rendering.is3.dataloader_numpy import ImpulsiveStationarySeparation

sr = 44100

dataset = ImpulsiveStationarySeparation()

We first sample one example from the test set. Each example is composed of:

  • a stationary background track
  • an impulses track
  • a mixture track

The mixture track is the input to all three methods, each of which separates it into a stationary component and an impulsive component.
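
To make the structure of such a scene concrete, here is a minimal sketch that builds a toy triple of tracks. It assumes the mixture is simply the background plus the gain-scaled impulses. The actual convention used by `read_scene`, including the meaning of `gain` and `norm_gain`, is not shown in this notebook, so `make_toy_scene` and all of its parameters are hypothetical.

```python
import numpy as np

def make_toy_scene(sr=44100, duration=1.0, gain_db=6.0, seed=0):
    """Build a toy (background, impulses, mixture, gain) tuple.

    Hypothetical helper: assumes mix = background + gain * impulses,
    which may differ from the dataset's real mixing convention.
    """
    rng = np.random.default_rng(seed)
    n = int(sr * duration)

    # Stationary background: low-level white noise.
    bkg = 0.05 * rng.standard_normal(n)

    # Impulses: a few exponentially decaying noise bursts of 10 ms each.
    impulse = np.zeros(n)
    burst_len = sr // 100
    decay = np.exp(-np.arange(burst_len) / (burst_len / 5))
    for start in rng.choice(n - burst_len, size=3, replace=False):
        impulse[start:start + burst_len] += decay * rng.standard_normal(burst_len)

    gain = 10.0 ** (gain_db / 20.0)  # dB to linear amplitude
    mix = bkg + gain * impulse
    return bkg, impulse, mix, gain

# The *_toy names avoid clobbering the notebook's own variables.
bkg_toy, impulse_toy, mix_toy, gain_toy = make_toy_scene()
```

Under this assumption, a perfect separation would return `bkg_toy` as the stationary track and `gain_toy * impulse_toy` as the impulsive track.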

In [12]:
bkg, impulse, mix, gain, norm_gain = dataset.read_scene(
    scene_index=1368, subset="test", dataset="random")
[Output: a figure and audio players for the Background, Impulses, and Mix tracks]

HPSS with a margin of 1¶

We first apply the HPSS decomposition masking method with a margin parameter equal to 1.

In [13]:
# HPSS
from rendering.is3.baselines import hpss

hpss_module = hpss.HarmonicPercussiveDecomposition(
    nfft=2048,
    window_size=2048,
    overlap=0.75,
    margin=1.
)

y_p, y_h, _, _ = hpss_module.forward(mix)


print("HPSS/Impulses")
display(Audio(y_p, rate=sr))

print("HPSS/Stationary Background")
display(Audio(y_h, rate=sr))
[Audio output: HPSS/Impulses, HPSS/Stationary Background]

Significant leakage of both the stationary and the impulsive components can be heard in the separated tracks. The resonance of the impulsive sound is poorly separated from the background, which makes the impulsive track sound dry.

HPSS with a margin of 2¶

We apply the same method with a larger margin of 2 to sharpen the separation between impulsive and stationary sounds.
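
The effect of the margin can be illustrated with a generic median-filtering HPSS sketch on a toy magnitude spectrogram. The internals of `hpss.HarmonicPercussiveDecomposition` are not shown in this notebook, so the code below is not its implementation: the binary masks, the kernel size of 17, and the helper names `median_filter_1d` and `hpss_binary_masks` are all assumptions made for illustration. The idea is that a bin is assigned to one component only if that component dominates the other by a factor of at least `margin`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def median_filter_1d(S, size, axis):
    """Median-filter a 2-D array along one axis (odd size, edge padding)."""
    pad = size // 2
    pad_width = [(0, 0), (0, 0)]
    pad_width[axis] = (pad, pad)
    padded = np.pad(S, pad_width, mode="edge")
    windows = sliding_window_view(padded, size, axis=axis)
    return np.median(windows, axis=-1)

def hpss_binary_masks(S, kernel=17, margin=1.0):
    """Binary masks for a magnitude spectrogram S of shape (freq, time)."""
    # Smoothing along time keeps horizontal (stationary) structure.
    harm = median_filter_1d(S, kernel, axis=1)
    # Smoothing along frequency keeps vertical (impulsive) structure.
    perc = median_filter_1d(S, kernel, axis=0)
    mask_h = harm > margin * perc
    mask_p = perc > margin * harm
    return mask_h, mask_p

# Toy spectrogram: noise floor, one steady tone, one broadband click.
rng = np.random.default_rng(0)
S = 0.01 * rng.random((64, 64))
S[20, :] += 1.0  # stationary: constant energy at one frequency bin
S[:, 40] += 1.0  # impulsive: energy across all frequencies in one frame

for margin in (1.0, 2.0):
    mask_h, mask_p = hpss_binary_masks(S, margin=margin)
    unassigned = ~(mask_h | mask_p)
    print(f"margin={margin}: {unassigned.mean():.1%} of bins in neither mask")
```

With a margin above 1, bins where neither component clearly dominates are dropped from both masks. This reduces leakage into each track, at the cost of discarding some energy from both.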

In [14]:
# HPSS with a larger margin (reuses the `hpss` import from the previous cell)
hpss_module_2 = hpss.HarmonicPercussiveDecomposition(
    nfft=2048,
    window_size=2048,
    overlap=0.75,
    margin=2.
)

y_p_2, y_h_2, _, _ = hpss_module_2.forward(mix)


print("HPSS/Impulses")
display(Audio(y_p_2, rate=sr))

print("HPSS/Stationary Background")
display(Audio(y_h_2, rate=sr))
[Audio output: HPSS/Impulses, HPSS/Stationary Background]

The leakage from the background is reduced on the impulsive track, but the stationary track still contains some impulsive sounds.

Wavelet filtering¶

We now apply the wavelet filtering method based on Nongpiur's work, with our added modifications so that it also predicts an impulsive track.

In [15]:
from rendering.is3.baselines import wavelet_script

wavelet_module = wavelet_script.WaveletBaseline(
    wavelet="db",
    level=13,
    sr=sr,
    ks=2.,
    ks_impulse=6.,
    kc=1.,
    kernel_size=1025,
)

wavelet_bkg, wavelet_impulse = wavelet_module.forward(mix)

print("Wavelet/Impulses")
display(Audio(wavelet_impulse, rate=sr))

print("Wavelet/Stationary Background")
display(Audio(wavelet_bkg, rate=sr))
[Audio output: Wavelet/Impulses, Wavelet/Stationary Background]

We obtain poor results on the impulsive track, with some audio artefacts. Moreover, on the stationary track, the original method proposed in Nongpiur's article only attenuates the impulsive sounds, which remain audible.

Note: The choice of parameters in this wavelet approach depends strongly on the type of impulses and the type of ambient sound (speech in the original article). We searched for parameters better suited to the context of our article, but the wide variety of sound types and acoustic scenes we study means that this approach performs very unevenly from one example to another.
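
The sensitivity to threshold parameters can be seen in a much simpler setting. The sketch below is neither Nongpiur's algorithm nor the `WaveletBaseline` class, whose internals are not shown here. It swaps in a plain Haar transform and a median-based threshold, and the multiplier `k` is only loosely analogous to the `ks` and `ks_impulse` parameters above (an assumption). All function names are hypothetical.

```python
import numpy as np

def haar_dwt(x, level):
    """Multi-level orthonormal Haar DWT; len(x) must be divisible by 2**level."""
    details = []
    approx = np.asarray(x, dtype=float)
    for _ in range(level):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))
        approx = (even + odd) / np.sqrt(2.0)
    return approx, details

def haar_idwt(approx, details):
    """Inverse of haar_dwt."""
    for detail in reversed(details):
        out = np.empty(2 * approx.size)
        out[0::2] = (approx + detail) / np.sqrt(2.0)
        out[1::2] = (approx - detail) / np.sqrt(2.0)
        approx = out
    return approx

def split_by_threshold(x, level=4, k=4.0):
    """Send detail coefficients larger than k * sigma to the impulse track."""
    approx, details = haar_dwt(x, level)
    bkg_details, imp_details = [], []
    for d in details:
        # Robust per-level scale estimate from the median absolute value.
        sigma = np.median(np.abs(d)) / 0.6745
        is_impulse = np.abs(d) > k * sigma
        imp_details.append(np.where(is_impulse, d, 0.0))
        bkg_details.append(np.where(is_impulse, 0.0, d))
    background = haar_idwt(approx, bkg_details)
    impulses = haar_idwt(np.zeros_like(approx), imp_details)
    return background, impulses

# Toy signal: white noise plus one short click.
rng = np.random.default_rng(0)
x = 0.1 * rng.standard_normal(1024)
x[500:504] += np.array([2.0, -1.5, 1.0, -0.5])

for k in (2.0, 6.0):
    background, impulses = split_by_threshold(x, k=k)
    print(f"k={k}: impulse-track energy = {np.sum(impulses ** 2):.3f}")
```

A low `k` pushes ordinary background coefficients into the impulse track, while a high `k` leaves weaker parts of the click in the background. The right value depends on the statistics of both components, which is consistent with the uneven behaviour described in the note.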

Proposed system IS³¶

Finally, we apply our proposed IS³ method to the mixture track.

In [16]:
from rendering.is3.model_wrapper import ModelWrapper
import torch

model = ModelWrapper(
    conf_name="014",
    job_id=None,
)
_ = model.eval()

y_i, y_s = model.forward(torch.tensor(mix).reshape(1, -1))

print("IS3/Impulses")
display(Audio(y_i[0].detach().numpy(), rate=sr))

print("IS3/Stationary Background")
display(Audio(y_s[0].detach().numpy(), rate=sr))
[Audio output: IS3/Impulses, IS3/Stationary Background]

The separation is much improved here, with no audible leakage from one component into the other. The only defect is a slight attenuation of the resonance of the impulsive sounds, which sound a little drier than in the target track.
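
Listening impressions like these can be complemented by a number. One common choice is the scale-invariant signal-to-distortion ratio (SI-SDR) between a target track and its estimate. The sketch below is a generic NumPy implementation run on synthetic data. The paper's own evaluation metrics are not reproduced in this notebook, and the helper name `si_sdr` is an assumption.

```python
import numpy as np

def si_sdr(reference, estimate, eps=1e-12):
    """Scale-invariant signal-to-distortion ratio in dB (higher is better)."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    reference = reference - reference.mean()
    estimate = estimate - estimate.mean()
    # Project the estimate onto the reference to get the scaled target part.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    residual = estimate - target
    ratio = (np.dot(target, target) + eps) / (np.dot(residual, residual) + eps)
    return 10.0 * np.log10(ratio)

# Synthetic check: more leakage in the estimate should lower the score.
rng = np.random.default_rng(0)
target = rng.standard_normal(44100)
leak = rng.standard_normal(44100)
for leak_level in (0.5, 0.05):
    estimate = target + leak_level * leak
    print(f"leak level {leak_level}: SI-SDR = {si_sdr(target, estimate):.1f} dB")
```

In this notebook it could be called as `si_sdr(impulse, y_p)` or `si_sdr(bkg, y_s[0].detach().numpy())`, provided each estimate has the same length as its target, which is not guaranteed by the code shown here.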

In [17]:
fig, axs = plt.subplots(5, 1, figsize=(15, 12), sharex=True, sharey=True)
fig.suptitle('Comparison of Impulse Separation Methods')

# Plot target impulse
axs[0].plot(impulse)
axs[0].set_title('Target Impulse')
axs[0].set_ylabel('Amplitude')

# Plot HPSS (margin=1) impulse
axs[1].plot(y_p)
axs[1].set_title('HPSS (margin=1) Impulse')
axs[1].set_ylabel('Amplitude')

# Plot HPSS (margin=2) impulse
axs[2].plot(y_p_2)
axs[2].set_title('HPSS (margin=2) Impulse')
axs[2].set_ylabel('Amplitude')

# Plot Wavelet impulse
axs[3].plot(wavelet_impulse)
axs[3].set_title('Wavelet Impulse')
axs[3].set_ylabel('Amplitude')

# Plot IS3 impulse
axs[4].plot(y_i[0].detach().numpy())
axs[4].set_title('IS³ Impulse')
axs[4].set_ylabel('Amplitude')
axs[4].set_xlabel('Sample')

plt.tight_layout()
plt.show()
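[Figure: Comparison of Impulse Separation Methods. Five stacked waveform panels: Target Impulse, HPSS (margin=1) Impulse, HPSS (margin=2) Impulse, Wavelet Impulse, IS³ Impulse. x-axis: Sample, y-axis: Amplitude.]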
In [18]:
fig, axs = plt.subplots(5, 1, figsize=(15, 12), sharex=True, sharey=True)
fig.suptitle('Comparison of Stationary/Background Separation Methods')

# Plot target background
axs[0].plot(bkg)
axs[0].set_title('Target Background')
axs[0].set_ylabel('Amplitude')

# Plot HPSS (margin=1) background
axs[1].plot(y_h)
axs[1].set_title('HPSS (margin=1) Background')
axs[1].set_ylabel('Amplitude')

# Plot HPSS (margin=2) background
axs[2].plot(y_h_2)
axs[2].set_title('HPSS (margin=2) Background')
axs[2].set_ylabel('Amplitude')

# Plot Wavelet background
axs[3].plot(wavelet_bkg)
axs[3].set_title('Wavelet Background')
axs[3].set_ylabel('Amplitude')

# Plot IS3 background
axs[4].plot(y_s[0].detach().numpy())
axs[4].set_title('IS³ Background')
axs[4].set_ylabel('Amplitude')
axs[4].set_xlabel('Sample')

plt.tight_layout()
plt.show()
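[Figure: Comparison of Stationary/Background Separation Methods. Five stacked waveform panels: Target Background, HPSS (margin=1) Background, HPSS (margin=2) Background, Wavelet Background, IS³ Background. x-axis: Sample, y-axis: Amplitude.]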